Inference for the mean of paired differences: Paired t-test

DATAX121-23A (HAM) & (SEC) - Introduction to Statistical Methods

Learning Outcomes

  • Quantifying the uncertainty for the sample mean of the paired differences
  • How to construct and interpret a confidence interval for the population mean of paired differences
  • How to conduct and interpret a hypothesis test for the population mean of paired differences

x̄Diff from one sample

Why do we analyse paired differences?

Context One

In a study to determine whether the colour red increases how attractive men find women, one group of men rate the attractiveness of a woman after seeing her picture on a red background and another group of men rate the same woman after seeing her picture on a white background.

Context Two

To measure the effectiveness of a new teaching method for math in elementary school, each student in a class getting the new instructional method is matched with a student in a separate class on IQ, family income, math ability level the previous year, reading level, and all demographic characteristics. At the end of the year, math ability levels are measured again.

Wait a minute…

Sampling distribution of x̄Diff

Suppose the population mean of the paired differences, \(\mu_\text{Diff}\), and the population standard deviation of the paired differences, \(\sigma_\text{Diff}\), are known. These are the ground “truths” (parameters) that summarise all possible values we could observe

The sampling distribution of the sample mean of the paired differences, \(\bar{x}_\text{Diff}\), is

\[ \bar{x}_\text{Diff} ~ \text{approx.} ~ \text{Normal} \! \left(\mu_{\bar{x}_\text{Diff}} = \mu_\text{Diff}, \sigma_{\bar{x}_\text{Diff}} = \frac{\sigma_\text{Diff}}{\sqrt{n}} \right) \]

The use of the \(\bar{x}_\text{Diff}\) subscripts is to make it clear that we are talking about the sampling distribution of \(\bar{x}_\text{Diff}\) and not the possible values we could observe
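A small simulation can make the sampling distribution concrete. The sketch below is illustrative only: the values of mu_diff, sigma_diff and n are hypothetical (not taken from any case study), and it assumes the paired differences are Normally distributed.

```r
# Hypothetical population parameters for the paired differences (assumptions)
mu_diff    <- 0.08
sigma_diff <- 0.02
n          <- 12

set.seed(121)  # arbitrary seed so the simulation is reproducible

# Draw 10,000 samples of n paired differences and keep each sample mean
xbar_diff <- replicate(10000, mean(rnorm(n, mean = mu_diff, sd = sigma_diff)))

# Compare the simulated centre and spread with the theoretical values
c(simulated = mean(xbar_diff), theory = mu_diff)
c(simulated = sd(xbar_diff),   theory = sigma_diff / sqrt(n))
```

The simulated mean and standard deviation of the sample means should sit close to \(\mu_\text{Diff}\) and \(\sigma_\text{Diff} / \sqrt{n}\) respectively.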

Assumptions for inference on μDiff

  1. Independent observations—typically met with random samples or randomisation of the data collection order with randomised experiments
  2. Unimodal—one peak
  3. Approximately symmetrical about the sample mean, \(\bar{x}_\text{Diff}\), and there are no outliers

More on 1.

We are not saying that the two related numeric variables are independent of one another. Instead, we are saying that the paired differences of the two related numeric variables are independent of each other

Definition: se(x̄Diff)

The standard error of the sample mean of the paired differences, \(\bar{x}_\text{Diff}\), is

\[ \text{se}(\bar{x}_\text{Diff}) = \frac{s_\text{Diff}}{\sqrt{n}} \]

where:

  • \(s_\text{Diff}\) is the sample standard deviation of the paired differences
  • \(n\) is the number of paired differences (one per pair of observations)
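As a minimal sketch, the standard error can be computed directly from a vector of paired differences. The helper name se_mean_diff() and the before/after values are hypothetical and used only for illustration.

```r
# Standard error of the sample mean of the paired differences
se_mean_diff <- function(diffs) {
  diffs <- diffs[!is.na(diffs)]        # drop incomplete pairs
  sd(diffs) / sqrt(length(diffs))      # s_Diff / sqrt(n)
}

# Hypothetical before/after measurements for five subjects
before <- c(10.2, 11.5, 9.8, 12.1, 10.9)
after  <- c(10.9, 11.9, 10.1, 12.8, 11.6)

se_mean_diff(after - before)
```

Note that the standard error is calculated from the differences themselves, not from the two original variables separately.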

CS 6.1: Wetsuit study

Can a certain swimsuit really make a swimmer faster? A study tested whether wearing wetsuits influences swimming velocity. Twelve competitive swimmers and triathletes each swam 1500 metres at maximum speed twice, once wearing a wetsuit and once wearing a regular bathing suit.

The order of the trials was randomised. Each time, the maximum velocity in metres per second (\(\text{ms}^{-1}\)) of the swimmer was recorded.

Variables

  • Athlete: An integer denoting which athlete
  • Wetsuit: A number denoting the athlete’s maximum swim velocity when wearing a wetsuit (in \(\text{ms}^{-1}\))
  • NoWetsuit: A number denoting the athlete’s maximum swim velocity when wearing a regular bathing suit (in \(\text{ms}^{-1}\))

wetsuits.df <- read.csv("datasets/wetsuits.csv")

# R code for demonstration purposes
library(lattice) # provides stripplot()
library(tidyr)   # provides pivot_longer()
wetsuits.df |>
  pivot_longer(Wetsuit:NoWetsuit, names_to = "Type", 
               values_to = "Velocity") |>
  stripplot(Type ~ Velocity, data = _, jitter.data = TRUE, 
    xlab = "Maximum swim velocity (m/s)", cex = 1.25,
    main = "Distribution of maximum swim velocities by gear")

Figure: The maximum swimming velocities of the 12 athletes by the type of swimming gear used

CS 6.1: Wetsuit study

Are all three assumptions met?

\(\bar{x}_\text{Diff} =\) \(0.0775\) (4 dp)
\(s_\text{Diff} =\) \(0.0218\) (4 dp)
\(n =\) \(12\)

wetsuits.df$Diff <- wetsuits.df$Wetsuit - wetsuits.df$NoWetsuit

summary(wetsuits.df$Diff)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0500  0.0575  0.0800  0.0775  0.1000  0.1100 
sd(wetsuits.df$Diff)
[1] 0.02179449
nrow(wetsuits.df)
[1] 12
stripplot( ~ Diff, data = wetsuits.df, jitter.data = TRUE, cex = 1.25, factor = 5,
  xlab = "Difference in maximum swim velocity, wetsuit - bathing suit, (m/s)", 
  main = "Distribution of paired differences in maximum swim velocities")

Figure: The paired differences in maximum swimming velocity (wetsuit − bathing suit) of the 12 athletes

A confidence interval for μDiff

Definition: 100(1 - α)% confidence interval for μDiff

\[ \bar{x}_\text{Diff} \pm t^*_{1-\alpha/2}(\nu) \times \text{se}(\bar{x}_\text{Diff}) \]

where:

  • \(\bar{x}_\text{Diff}\) is the sample mean of the paired differences
  • \(n\) is the number of paired differences (one per pair of observations)
  • The confidence level is \((1 - \alpha)\), where \(\alpha\) is a proportion
  • The degrees of freedom, \(\nu\)
    • For a \((1 - \alpha)\) C.I. for \(\mu_\text{Diff}\), we set this to \(\nu = n - 1\)
  • \(t^*_{1-\alpha/2}(\nu)\) is the t-multiplier for the prescribed confidence level of \((1 - \alpha)\)
  • \(\text{se}(\bar{x}_\text{Diff})\) is the standard error of \(\bar{x}_\text{Diff}\)—see Slide 8
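The definition above can be sketched as a short R function. The name paired_ci() and the example differences are hypothetical; qt() supplies the t-multiplier, and the result is compared with the interval reported by t.test().

```r
# Confidence interval for the population mean of the paired differences
paired_ci <- function(diffs, conf.level = 0.95) {
  n      <- length(diffs)
  alpha  <- 1 - conf.level
  se     <- sd(diffs) / sqrt(n)             # standard error of the mean difference
  t_mult <- qt(1 - alpha / 2, df = n - 1)   # t-multiplier with nu = n - 1
  mean(diffs) + c(-1, 1) * t_mult * se      # estimate -/+ margin of error
}

# Hypothetical paired differences
diffs <- c(0.7, 0.4, 0.3, 0.7, 0.7, 0.5)

paired_ci(diffs)
t.test(diffs)$conf.int   # should agree with the hand calculation
```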

CS 6.1: Wetsuit study

Recall that we calculated the paired differences as Wetsuit − NoWetsuit \[ \bar{x}_\text{Diff} = 0.0775 ~ \text{ms}^{\text{-1}}, \quad s_\text{Diff} = 0.0218 ~ \text{ms}^{\text{-1}}, \quad n = 12, \quad t^\ast_{0.975}(11) = 2.20 \]

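\[ \text{se}(\bar{x}_\text{Diff}) = \frac{0.0218}{\sqrt{12}} = 0.0062931 \]

\[ 0.0775 \pm 2.20 \times 0.0062931 = (0.0636551, 0.0913449) \]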
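The same arithmetic can be sketched in R from the summary statistics alone. This version uses the unrounded standard deviation and t-multiplier, so it may differ from the hand calculation in the later decimal places.

```r
# Summary statistics for CS 6.1 (from the R output shown earlier)
xbar_diff <- 0.0775
s_diff    <- 0.02179449
n         <- 12

se     <- s_diff / sqrt(n)          # standard error of the mean difference
t_mult <- qt(0.975, df = n - 1)     # unrounded t-multiplier, approximately 2.20

ci <- xbar_diff + c(-1, 1) * t_mult * se
round(ci, 4)
```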

Interpretation of a confidence interval for μDiff

For CS 6.1, the 95% confidence interval for the underlying (population) mean of the paired differences was (0.06365514, 0.09134486).

Attempt One…

We are 95% sure that the underlying mean of the difference between wetsuits and bathing suits for competitive swimmers and triathletes was somewhere between 0.064 and 0.091 metres per second in favour of wetsuits

Attempt Two…

We are 95% confident that for competitive swimmers and triathletes, wetsuits increase maximum swimming velocity by an average of somewhere between 0.064 and 0.091 metres per second relative to bathing suits

A hypothesis test for μDiff

Definition: The test statistic for μDiff

\[ t_0 = \frac{\bar{x}_\text{Diff} - \mu_{\text{Diff}0}}{\text{se}(\bar{x}_\text{Diff})} \]

where:

  • \(t_0\) is the t-test statistic (for μDiff)
  • \(\bar{x}_\text{Diff}\) is the sample mean of the paired differences
  • \(\mu_{\text{Diff}0}\) is the hypothesised value of the population mean of the paired differences
  • \(\text{se}(\bar{x}_\text{Diff})\) is the standard error of \(\bar{x}_\text{Diff}\)—see Slide 8
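A minimal sketch of the test statistic as an R function follows. The name paired_t_stat(), the example differences and the hypothesised value of 0.5 are all hypothetical; the result is checked against the statistic reported by t.test().

```r
# t-test statistic for the population mean of the paired differences
paired_t_stat <- function(diffs, mu0 = 0) {
  se <- sd(diffs) / sqrt(length(diffs))   # standard error of the mean difference
  (mean(diffs) - mu0) / se                # (estimate - hypothesised value) / se
}

# Hypothetical paired differences and hypothesised value
diffs <- c(0.7, 0.4, 0.3, 0.7, 0.7, 0.5)

paired_t_stat(diffs, mu0 = 0.5)
unname(t.test(diffs, mu = 0.5)$statistic)   # should agree
```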

CS 6.1: Wetsuit study

Recall that we calculated the paired differences as Wetsuit − NoWetsuit \[ \bar{x}_\text{Diff} = 0.0775 ~ \text{ms}^{\text{-1}}, \quad s_\text{Diff} = 0.0218 ~ \text{ms}^{\text{-1}}, \quad n = 12, \quad t^\ast_{0.975}(11) = 2.20 \]

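\[ \text{se}(\bar{x}_\text{Diff}) = \frac{0.0218}{\sqrt{12}} = 0.0062931, \quad t_0 = \frac{0.0775 - 0}{0.0062931} = 12.3150401 \]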

Interpretation of a hypothesis test for μDiff

For CS 6.1, the exact p-value for the following set of hypothesis statements was \(1.163284 \times 10^{-7}\)

\(\phantom{\bullet} H_0\!: \mu_\text{Diff} = 0\)
\(\phantom{\bullet} H_1\!: \mu_\text{Diff} \neq 0\)
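For a two-sided alternative, the p-value is twice the tail area beyond \(|t_0|\) under a t-distribution with \(n - 1\) degrees of freedom. The sketch below applies this to the CS 6.1 summary statistics using pt(). It uses the unrounded standard deviation, so the result should be of the order of \(10^{-7}\), but may not match the quoted p-value to every digit.

```r
# Summary statistics for CS 6.1 (from the R output shown earlier)
xbar_diff <- 0.0775
s_diff    <- 0.02179449
n         <- 12
mu0       <- 0                      # hypothesised value under H0

t0 <- (xbar_diff - mu0) / (s_diff / sqrt(n))

# Two-sided p-value: area in both tails beyond |t0|
p_value <- 2 * pt(abs(t0), df = n - 1, lower.tail = FALSE)

t0
p_value
```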

CS 6.2: The freshman 15 effect

Is it true that students tend to gain weight during their first year in university? A US professor recruited freshman students (first-years) from a large introductory health paper. Students were weighed during the first week of the semester and then again 12 weeks later.

The professor hypothesised that students gained, on average, 15 pounds over this period (the freshman 15 effect).

Variables

  • Subject: An integer denoting the anonymised student identifier
  • Initial.Weight: A number denoting the initial weight of the student (in pounds)
  • Terminal.Weight: A number denoting the weight of the student in week 12 (in pounds)
  • Weight.Diff: A number denoting the paired difference, Terminal.Weight − Initial.Weight (in pounds)

freshman.df <- read.csv("datasets/freshman.csv")
bwplot(~ Weight.Diff, data = freshman.df, pch = "|", 
  main = "Distribution of paired weight differences",
  xlab = "Weight difference between weeks twelve and one (pounds)")

Figure: The paired weight differences of 68 students

CS 6.2: Checking assumptions

Independence
The professor recruited the students, but the recruitment method (simple random sampling, choice sampling, self-selection, etc.) was not described. Therefore, there is no clear evidence that the independence assumption has been met

Unimodal
Not very clear from the boxplot. However, the histogram of the paired differences clearly shows that the distribution is unimodal

Approximately symmetrical about the sample mean, \(\bar{x}_\text{Diff}\), and there are no outliers
\(\bar{x}_{\text{Wk.12}-\text{Wk.1}} = 1.91\)

The distribution of the paired differences is approximately symmetrical about its sample mean

Figure: The paired weight differences of 68 students

CS 6.2: The analysis with t.test()

What are the null and alternative hypothesis statements?

If the paired differences were calculated beforehand:

t.test(Weight.Diff ~ 1, mu = 15,
       data = freshman.df)

    One Sample t-test

data:  Weight.Diff
t = -50.712, df = 67, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 15
95 percent confidence interval:
 1.396621 2.426909
sample estimates:
mean of x 
 1.911765 

If the paired differences were not calculated beforehand:

t.test(Pair(Terminal.Weight, Initial.Weight) ~ 1, mu = 15,
       data = freshman.df)

    Paired t-test

data:  Pair(Terminal.Weight, Initial.Weight)
t = -50.712, df = 67, p-value < 2.2e-16
alternative hypothesis: true mean difference is not equal to 15
95 percent confidence interval:
 1.396621 2.426909
sample estimates:
mean difference 
       1.911765 

Note the use of the Pair() function to wrap both numeric variables
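To see that these calls are equivalent, the sketch below runs three versions of the same analysis on a small hypothetical data frame (demo.df and its values are made up for illustration). It assumes R 4.0.0 or later, where the formula interface accepts ~ 1 and Pair().

```r
# Hypothetical before/after measurements for six subjects
demo.df <- data.frame(
  Before = c(150, 162, 138, 171, 155, 149),
  After  = c(153, 161, 142, 174, 158, 150)
)
demo.df$Diff <- demo.df$After - demo.df$Before

# 1. One-sample t-test on the pre-computed differences
fit_diff <- t.test(Diff ~ 1, mu = 0, data = demo.df)

# 2. Paired t-test using Pair() in the formula
fit_pair <- t.test(Pair(After, Before) ~ 1, mu = 0, data = demo.df)

# 3. Paired t-test using two vectors and paired = TRUE
fit_vec <- t.test(demo.df$After, demo.df$Before, paired = TRUE)

# All three should report the same test statistic and p-value
c(fit_diff$statistic, fit_pair$statistic, fit_vec$statistic)
c(fit_diff$p.value, fit_pair$p.value, fit_vec$p.value)
```

The order of the variables matters: Pair(After, Before) analyses After − Before, so swapping them flips the sign of the estimate and the interval.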

CS 6.2: Interpretation of output

95% CI for \(\mu_{\text{Wk.12}-\text{Wk.1}}\)
We are 95% sure that students gain, on average, somewhere between 1.4 and 2.4 pounds by Week 12 compared to Week 1

Hypothesis Test for \(\mu_{\text{Wk.12}-\text{Wk.1}} = 15\)
We have very strong evidence against the null that the students’ underlying mean weight difference between Weeks twelve and one is equal to 15 pounds, in favour of the alternative that it is not (p-value < \(2.2 \times 10^{-16}\))

t.test(Weight.Diff ~ 1, mu = 15, data = freshman.df)

    One Sample t-test

data:  Weight.Diff
t = -50.712, df = 67, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 15
95 percent confidence interval:
 1.396621 2.426909
sample estimates:
mean of x 
 1.911765 
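As a rough check on how the pieces of this output fit together, the standard error can be backed out from the reported estimate and test statistic, and the confidence interval rebuilt from it. The inputs below are the rounded values printed by t.test(), so the result should only match the reported interval to about three decimal places.

```r
# Values read off the t.test() output above
xbar_diff <- 1.911765   # sample mean of the paired differences
mu0       <- 15         # hypothesised value
t0        <- -50.712    # reported test statistic
df        <- 67         # degrees of freedom, n - 1

# Rearranging t0 = (xbar_diff - mu0) / se gives the standard error
se <- (xbar_diff - mu0) / t0

# Rebuild the 95% confidence interval from the estimate and standard error
ci <- xbar_diff + c(-1, 1) * qt(0.975, df = df) * se

se
ci
```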

CS 6.2: Is independence necessary?

It was found that the professor’s subjects were self-selected for the study